“Sex is the queen of problems in evolutionary biology. Perhaps no 
other natural phenomenon has aroused so much interest; certainly 
none has sowed so much confusion.” 


—Graham Bell, 1982 


Abstract 


We consider a recent innovative theory by Chastain et al. on the role of
sex in evolution [Chastain et al., 2014]. In short, the theory suggests that the
evolutionary process of gene recombination implements the celebrated
multiplicative weights updates algorithm (MWUA). They prove that the
population dynamics induced by sexual reproduction can be precisely modeled
by genes that use MWUA as their learning strategy in a particular coordination
game. The result holds in the environments of weak selection, under the
assumption that the population frequencies remain a product distribution.

We revisit the theory, eliminating both the requirement of weak selection
and any assumption on the distribution of the population. Removing
the assumption of product distributions is crucial, since as we show, this
assumption is inconsistent with the population dynamics. We show that the
marginal allele distributions induced by the population dynamics precisely
match the marginals induced by a multiplicative weights update algorithm
in this general setting, thereby affirming and substantially generalizing these
earlier results.

We further revise the implications for convergence and utility or fitness
guarantees in coordination games. In contrast to the claim of Chastain et
al. [2014], we conclude that the sexual evolutionary dynamics does not entail
any property of the population distribution, beyond those already implied by
convergence.


1 Introduction


Connections between the theory of evolution, machine learning and games have
captured the imagination of researchers for decades. Evolutionary models inspired
a range of applications from genetic algorithms to the design of distributed
multi-agent systems [Goldberg and Holland, 1988; Cetnarowicz et al., 1996;
Phelps et al., 2008]. Within game theory, several solution concepts follow
evolutionary processes, and some of the most promising dynamics that lead to
equilibria in games assume that players learn the behavior of their opponents
[Haigh, 1975; Valiant, 2009].


A different connection between sex, evolution and machine learning was
recently suggested by Chastain, Livnat, Papadimitriou and Vazirani [2014]. As they
explain, also referring to Barton and Charlesworth [1998], sexual reproduction is
costly for the individual and for the society in terms of time and energy, and often
breaks successful gene combinations. From the perspective of an individual, sex
dilutes his or her genes by only transferring half of them to each offspring. Thus
the question that arises is why sexual reproduction is so common in nature, and
why it is so successful. Chastain et al. [2014] suggest that the evolutionary process
under sexual reproduction effectively implements a celebrated no-regret learning
algorithm. The structure of their argument is as follows.

First, they restrict attention to a particular class of fitness landscape where weak
selection holds. Informally, weak selection means that the fitness difference
between genotypes is bounded by a small constant, i.e., there are no extremely good
or extremely bad gene combinations.^1 Second, they consider the distribution of
each gene’s alleles as a mixed strategy in a matrix-form game, where there is one
player for each gene. The game is an identical interest game, where each player
gets the same utility; thus the joint distribution of alleles corresponds to the mixed
strategy of each player, and the expected payoff of the game corresponds to the
average fitness level of the population.

Chastain et al. [2014] provide a correspondence between the sexual population
dynamics and the multiplicative weights update algorithm (MWUA)
[Littlestone and Warmuth, 1994; Cesa-Bianchi et al., 1997]. In particular, they
establish a correspondence between strategies adopted by players in the game that
adopt MWUA and the population dynamics, under an assumption that the fitness
matrix is in the weak selection regime, and that the population dynamic retains
the structure of a product distribution on alleles. With this correspondence in
place, these authors apply the fact that using MWUA in a repeated game leads to
diminishing regret for each player to conclude that the regret experienced by
genes also diminishes. They interpret this result as maximization of a property
of the population distribution, namely “the sum of the expected cumulative
differential fitness over the alleles, plus the distribution’s entropy.”

^1 A gene takes on a particular form, known as an allele. By a genotype, or gene
combination, we refer to a set of alleles, one for each gene. The genotype
determines the properties of the creature, and hence its fitness in a given
environment.

We believe that such a multiagent abstraction of the evolutionary process can
contribute much to our understanding, both of evolution and of learning algorithms.
Interestingly, the agents in this model are not the creatures in the population, nor
Dawkins’ [2006] genetic properties (alleles) that compete with one another,
but rather genes that share a mutual cause.


1.1 Our Contribution 


We show that the main results of Chastain et al. [2014] can be substantially
generalized. Specifically, we consider the two standard population dynamics (where
recombination acts before selection (RS), and vice versa (SR)), and show that each
of them precisely describes the marginal allele distribution under a variation of the
multiplicative updates algorithm that is described for correlated strategies. This
correspondence holds for any number of genes/players, any fitness matrix, any
recombination rate, and any initial population frequencies. In particular, and in
contrast to Chastain et al., we do not assume weak selection or require that the
population distribution remains a product distribution (i.e., with allele
probabilities that are independent), and we allow both the SR model and the RS model.

We discuss some of the implications of this correspondence between these
biological and algorithmic processes for theoretical convergence properties. Under
weak selection, the observation that the cumulative regret of every gene is bounded
follows immediately from known convergence results, both in population dynamics
and in game theory (see related work). We show that under the SR dynamics,
every gene still has a bounded cumulative regret, without assuming weak selection
or a product distribution.

Our analysis also uncovers what we view as one technical gap and one conceptual
gap regarding the fine details in the original argument of Chastain et al. [2014].
We believe that due to the far-reaching consequences of the theory it is important
to rectify these details. First, according to the population dynamics the population
frequencies may become highly correlated (even under weak selection), and thus
it is important to avoid the assumption on product distributions. Second, the
property that is supposedly maximized by the population dynamics is already entailed
by the convergence of the process (regardless of what equilibrium is reached). We
should therefore be careful when interpreting it as some nontrivial advantage of the
evolutionary process.


1.2 Related Work


The multiplicative weights update algorithm (MWUA) is a general name for a
broad class of methods in which a decision maker facing uncertainty (or a player
in a repeated game) updates her strategy. While specifics vary, all variations of
MWUA increase the probability of actions that have been more successful in
previous rounds. In a repeated game, MWUA is known to lead to diminishing regret
over time [Jafari et al., 2001; Blum and Mansour, 2007; Kale, 2007;
Cesa-Bianchi et al., 2001], but does not, in general, converge to a Nash
equilibrium of the game. For some classes of games better convergence results are
known; see Section 5 for details.

The fundamental theorem of natural selection, which dates back to Fisher [1930],
states that the population dynamics of a single-locus diploid always increases the
average fitness of each generation, until it reaches convergence
[Mulholland and Smith, 1959; Li, 1969].^2 The fundamental theorem further relates
the rate of increase to the variance of fitness in the population. In the general
case, for genotypes with more than a single locus, the fundamental theorem does not
hold, although constructing a counterexample where a cycle occurs is non-trivial
[Hastings, 1981; Hofbauer and Iooss, 1984]. However, convergence of the population
dynamics has been shown to hold when the fitness landscape has some specific
properties, such as weak selection, or weak epistasis [Nagylaki et al., 1999].^3

In asexual evolutionary dynamics, every descendant is an exact copy of a single
parent, with more fit parents producing more offspring (“survival of the fittest”).
Regardless of the number of loci, asexual dynamics coincides with MWUA by
a single player [Börgers and Sarin, 1997; Hopkins, 1999]. Chastain et al. [2014]
were the first to suggest that a similar correspondence can be established for sexual
population dynamics.


2 Definitions 


We follow the definitions of Chastain et al. [2014] where possible. For a more
detailed explanation of the biological terms and equations, see Bürger [2011].


^2 Roughly, a single locus means there is only one property that determines fitness,
for example eye color or length of tail. Multiple loci mean that fitness is
determined by a combination of several such properties. We explain what diploids
and haploids are in the next section.

^3 Weak epistasis means that the various genes have separate, nearly additive
contributions to fitness. It is incomparable to weak selection.


 w_ij | b_1   b_2      p^0_ij | b_1   b_2  | x^0      p^1_ij | b_1    b_2   | x^1
 -----+-----------     -------+------------+------    -------+--------------+------
 a_1  | 1     0.5      a_1    | 0.1   0.15 | 0.25     a_1    | 0.094  0.082 | 0.174
 a_2  | 1.5   1.2      a_2    | 0.1   0.15 | 0.25     a_2    | 0.158  0.170 | 0.328
 a_3  | 1.3   0.8      a_3    | 0.2   0.3  | 0.5      a_3    | 0.255  0.242 | 0.497
                       y^0    | 0.4   0.6  |          y^1    | 0.507  0.493 |

Table 1: An example of a 2-locus haploid fitness matrix, with $n = 3$, $m = 2$.
There are 6 allele combinations, or genotypes. On the left we give the fitness of
each combination $(a_i, b_j)$. In the middle, we provide an initial population
distribution, and on the right we provide the distribution after one update step
of the SR dynamics, for $r = 0.5$. The columns $x$ and rows $y$ give the marginal
frequencies of the alleles of genes A and B, respectively.


2.1 Population dynamics 

A haploid is a creature that has only one copy of each gene. Each gene has several
distinct alleles. For example, a gene for eye color can have alleles for black, brown,
green or blue color. In contrast, people are diploids, and have two copies of each
gene, one from each parent.

Under asexual reproduction, an offspring inherits all of its genes from its single
parent. In the case of sexual reproduction, each parent transfers half of its genes.
Thus a haploid inherits half of its properties from one parent and half from the
other parent. To keep the presentation simple we focus on the case of a 2-locus
haploid. This means that there are two genes, denoted A and B. In the appendix
we extend the definitions and results to $k$-locus haploids, for $k > 2$. Gene A has
$n$ possible alleles $a_1, \ldots, a_n$, and gene B has $m$ possible alleles
$b_1, \ldots, b_m$. It is possible that $n \neq m$. We denote the set
$\{1, \ldots, n\}$ by $[n]$. A pair $(a_i, b_j)$ of alleles defines a genotype.

Let $W = (w_{ij})_{i \le n, j \le m}$ denote a fitness matrix. The fitness value
$w_{ij} \in \mathbb{R}_+$ can be interpreted as the expected number of offspring of
a creature whose genotype is $(a_i, b_j)$. We assume that the fitness matrix is
fixed, and does not change throughout evolution. See Table 1 for an example.

We denote by $P^t = (p^t_{ij})_{i \le n, j \le m}$ the distribution of the
population at time $t$. The average fitness $\bar{w}^t$ at time $t$ is written as

$$\bar{w}^t = w(P^t) = \sum_{ij} p^t_{ij} w_{ij}. \qquad (1)$$

For example, the populations in Table 1 have an average fitness of $w(P^0) = 1.005$
and $w(P^1) = 1.1012$. Denote by $x^t_i = \sum_j p^t_{ij}$ and
$y^t_j = \sum_i p^t_{ij}$ the marginal frequencies at time $t$ of alleles $a_i$ and
$b_j$, respectively. Clearly $p^t_{ij} = x^t_i y^t_j$ for all $i, j$ iff $P^t$ is a
product distribution. In the context of population dynamics, the set of
product distributions is also called the Wright manifold. For general distributions,
$D^t_{ij} = p^t_{ij} - x^t_i y^t_j$ is called the linkage disequilibrium.
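The quantities defined above can be checked numerically on the Table 1 example. The following is a minimal sketch in plain Python; the function names (`average_fitness`, `marginals`, `linkage_disequilibrium`) are illustrative choices, not notation from the paper.

```python
# Fitness matrix W and initial population P^0 from Table 1
# (rows: alleles a_1..a_3 of gene A, columns: alleles b_1, b_2 of gene B).
W = [[1.0, 0.5], [1.5, 1.2], [1.3, 0.8]]
P0 = [[0.10, 0.15], [0.10, 0.15], [0.20, 0.30]]

def average_fitness(P, W):
    """Eq. (1): sum over all genotypes of frequency times fitness."""
    return sum(p * w for p_row, w_row in zip(P, W) for p, w in zip(p_row, w_row))

def marginals(P):
    """Return (x, y): x_i = sum_j p_ij and y_j = sum_i p_ij."""
    x = [sum(row) for row in P]
    y = [sum(col) for col in zip(*P)]
    return x, y

def linkage_disequilibrium(P):
    """D_ij = p_ij - x_i * y_j; all zeros iff P is a product distribution."""
    x, y = marginals(P)
    return [[P[i][j] - x[i] * y[j] for j in range(len(y))] for i in range(len(x))]

w_bar0 = average_fitness(P0, W)
x0, y0 = marginals(P0)
D0 = linkage_disequilibrium(P0)
```

For this input the average fitness is 1.005 and every entry of `D0` is zero up to floating-point error, i.e. the initial population lies on the Wright manifold.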

The selection strength of $W$ is the minimal $s$ s.t. $w_{ij} \in [1 - s, 1 + s]$
for all $i, j$. We say that $W$ is in the weak selection regime if $s$ is very
small, i.e., all of the $w_{ij}$ are close to 1.




Update step. In asexual reproduction, every creature of genotype $(a_i, b_j)$ has in
expectation $w_{ij}$ offspring, all of which are of genotype $(a_i, b_j)$. Thus there is
only selection and no recombination, and the frequencies in the next period are

$$p^S_{ij} = \frac{1}{\bar{w}^t}\, w_{ij}\, p^t_{ij}.$$

In sexual reproduction, every pair of creatures, say of genotypes $(a_i, b_l)$ and
$(a_k, b_j)$, bring offspring who may belong (with equal probabilities) to genotype
$(a_i, b_l)$, $(a_k, b_j)$, $(a_i, b_j)$, or $(a_k, b_l)$. Thus, in the next
generation, a creature of genotype $(a_i, b_j)$ can be the result of combining one
parent of genotype $(a_i, ?)$ with another parent of genotype $(?, b_j)$. There are
two ways to infer the distribution of the next generation, depending on whether
recombination occurs before selection or vice versa (see, e.g.,
[Michalakis and Slatkin, 1996]). We describe each of these two ways next.


Selection before recombination (SR). Summing over all possible matches and
their frequencies, and normalizing, we get:

$$p^{SR}_{ij} = \frac{\sum_{l \in [m]} \sum_{k \in [n]} p^t_{il} w_{il}\, p^t_{kj} w_{kj}}{(\bar{w}^t)^2}.$$

In addition, the recombination rate, $r \in [0, 1]$, determines the part of the
genome that is being replaced in crossover, so $r = 1$ means that the entire genome
is the result of recombination, whereas $r = 0$ means no recombination occurs, and
the offspring are genetically identical to one of their parents. Given this, population
frequencies in the next period are set as:

$$p^{t+1}_{ij} = r\, p^{SR}_{ij} + (1 - r)\, p^S_{ij}, \qquad (2)$$

where $p^S_{ij} = w_{ij} p^t_{ij} / \bar{w}^t$ is the frequency after selection alone.
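One SR update step can be sketched as follows. This is an illustrative implementation under the definitions above, not code from the paper; `sr_step` is a hypothetical name. It uses the fact that the double sum in the numerator of $p^{SR}_{ij}$ factorizes into the product of the post-selection marginals.

```python
def sr_step(P, W, r):
    """One step of the SR dynamics, Eq. (2): selection, then recombination at rate r."""
    n, m = len(P), len(P[0])
    w_bar = sum(P[i][j] * W[i][j] for i in range(n) for j in range(m))
    # Frequencies after selection alone: p^S_ij = w_ij * p_ij / w_bar.
    PS = [[P[i][j] * W[i][j] / w_bar for j in range(m)] for i in range(n)]
    # Marginals of the selected population.
    xs = [sum(PS[i]) for i in range(n)]
    ys = [sum(PS[i][j] for i in range(n)) for j in range(m)]
    # p^SR_ij = (sum_l p_il w_il)(sum_k p_kj w_kj) / w_bar^2 = xs[i] * ys[j].
    return [[r * xs[i] * ys[j] + (1 - r) * PS[i][j] for j in range(m)]
            for i in range(n)]

# Table 1 inputs.
W = [[1.0, 0.5], [1.5, 1.2], [1.3, 0.8]]
P0 = [[0.10, 0.15], [0.10, 0.15], [0.20, 0.30]]
P1 = sr_step(P0, W, 0.5)
x1 = [sum(row) for row in P1]
w_bar1 = sum(P1[i][j] * W[i][j] for i in range(3) for j in range(2))
```

With these inputs the sketch gives marginals `x1` of roughly (0.174, 0.328, 0.498) and an average fitness `w_bar1` of about 1.1012, in line with the values quoted in the text for $w(P^1)$.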


Recombination before selection (RS). With only recombination, the frequency
of the genotype $(a_i, b_j)$ is the product of the respective probabilities in the
previous generation, i.e., $p^R_{ij} = x^t_i y^t_j$. When recombination occurs before
selection, we have (before normalization):

$$p^{RS}_{ij} = w_{ij}\, p^R_{ij} = w_{ij}\, x^t_i y^t_j.$$

Taking into account the recombination rate and normalization, we get,

$$p^{t+1}_{ij} = \frac{1}{\bar{w}^R}\, w_{ij} \left( r\, p^R_{ij} + (1 - r)\, p^t_{ij} \right) = \frac{1}{\bar{w}^R}\, w_{ij} \left( p^t_{ij} - r D^t_{ij} \right), \qquad (3)$$

where $\bar{w}^R = \sum_{ij} w_{ij} (p^t_{ij} - r D^t_{ij})$, and $D^t_{ij}$ is the
linkage disequilibrium at time $t$.

For an example of change in population frequencies, see Tables 1 through 4. We say
that $P^t$ is a stable state under a particular dynamics, if $P^{t+1} = P^t$.
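The RS update of Eq. (3) can be sketched in the same way. Again this is only an illustration with a hypothetical function name (`rs_step`); it mixes the recombined and the unrecombined population before applying selection and normalizing.

```python
def rs_step(P, W, r):
    """One step of the RS dynamics, Eq. (3): recombination at rate r, then selection."""
    n, m = len(P), len(P[0])
    x = [sum(P[i]) for i in range(n)]
    y = [sum(P[i][j] for i in range(n)) for j in range(m)]
    # Unnormalized frequencies: w_ij * (r * x_i * y_j + (1 - r) * p_ij).
    U = [[W[i][j] * (r * x[i] * y[j] + (1 - r) * P[i][j]) for j in range(m)]
         for i in range(n)]
    w_R = sum(sum(row) for row in U)  # normalizing constant, written w-bar^R in the text
    return [[u / w_R for u in row] for row in U]

# Table 1 inputs; P0 is a product distribution.
W = [[1.0, 0.5], [1.5, 1.2], [1.3, 0.8]]
P0 = [[0.10, 0.15], [0.10, 0.15], [0.20, 0.30]]
P1_full = rs_step(P0, W, 1.0)
P1_half = rs_step(P0, W, 0.5)
```

Because `P0` is a product distribution, recombination has nothing to mix, so `P1_full` and `P1_half` coincide; the first entry is about 0.0995 in both cases.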


2.2 Identical interest games 

An identical interest game of two players is defined by a payoff matrix $G$, where
$g_{ij}$ is the payoff of each player if the first plays action $i$, and the second
plays action $j$. A mixed strategy of a player is an independent distribution over
her actions. The mixed strategies $x, y$ are a Nash equilibrium if no player can
switch to a strategy that has a strictly higher expected payoff. That is, if for any
action $i' \in [n]$, $\sum_j y_j g_{i'j} \le \sum_i \sum_j x_i y_j g_{ij}$, and
similarly for any $j' \in [m]$.

Every fitness matrix $W$ induces an identical interest game, where $g_{ij} = w_{ij}$.
This is a game where each of the two genes selects an allele as an action (or a
distribution over alleles, as a mixed strategy). A matrix of population frequencies
$P$ can be thought of as correlated strategies for the players. The expected payoff
of each player under these strategies is $w(P)$. Given a distribution $P$, $G|_P$ is
the subgame of $G$ induced by the support of $P$. That is, the subgame where action
$i$ is allowed iff $p_{ij} > 0$ for some $j$, and likewise for action $j$.
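The Nash condition for an identical interest game is easy to test directly, since it suffices to compare the expected payoff with the best pure deviation of each player. The sketch below is illustrative only (`expected_payoff` and `is_nash` are hypothetical names) and uses the Table 1 fitness matrix as the game.

```python
def expected_payoff(x, y, G):
    """Common payoff of both players under independent mixed strategies x and y."""
    return sum(x[i] * y[j] * G[i][j] for i in range(len(x)) for j in range(len(y)))

def is_nash(x, y, G, tol=1e-12):
    """True if neither player has a pure deviation with strictly higher payoff."""
    v = expected_payoff(x, y, G)
    best_row = max(sum(y[j] * G[i][j] for j in range(len(y))) for i in range(len(x)))
    best_col = max(sum(x[i] * G[i][j] for i in range(len(x))) for j in range(len(y)))
    return best_row <= v + tol and best_col <= v + tol

# The game induced by the Table 1 fitness matrix (g_ij = w_ij).
G = [[1.0, 0.5], [1.5, 1.2], [1.3, 0.8]]
nash_a2_b1 = is_nash([0, 1, 0], [1, 0], G)   # genotype (a_2, b_1), fitness 1.5
nash_a1_b1 = is_nash([1, 0, 0], [1, 0], G)   # genotype (a_1, b_1), fitness 1.0
```

Here `(a_2, b_1)` is a pure Nash equilibrium because 1.5 is the largest entry in both its row and its column, while `(a_1, b_1)` is not, since gene A gains by switching to `a_2`.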


2.3 Multiplicative updates algorithms 


Suppose that two players play a game $G$ (not necessarily identical interest)
repeatedly. Each player observes the strategy of her opponent in each turn, and
can change her own strategy accordingly.^4 One prominent approach is to
gradually put more weight (i.e., probability) on pure actions that were good in the
previous steps. Many variations of the multiplicative weights update algorithm
(MWUA) are built upon this idea, and some have been applied to strategic
settings [Blum and Mansour, 2007; Marden et al., 2009; Kleinberg et al., 2009]. We
follow the variation used by Chastain et al. [2014]. This variation is equivalent
to the Polynomial Weights (PW) algorithm [2007], under the assumption that the
utility of all actions $(a_i)_{i \le n}$ is observed after each period (see Kale [2007], p. 10).


^4 We assume that the player observes the full joint distribution and can thus infer
the (expected) utility of every action $a_i$ at time $t$.

 P^1 (SR) | b_1    b_2   | x^1       P^1 (RS) | b_1    b_2
 ---------+--------------+------     ---------+-------------
 a_1      | 0.088  0.085 | 0.174     a_1      | 0.099  0.074
 a_2      | 0.167  0.162 | 0.328     a_2      | 0.149  0.179
 a_3      | 0.252  0.245 | 0.497     a_3      | 0.258  0.239
 y^1      | 0.507  0.493 |           y^1      | 0.507  0.493

Table 2: Population frequencies $P^1$, i.e. after one update step, for $r = 1$. On the
left, we provide frequencies when selection occurs before recombination, whereas
on the right recombination precedes selection. The updated marginal frequencies
of alleles are the same in both cases.


 P^1 (SR) | b_1    b_2   | x^1       P^1 (RS) | b_1    b_2   | x^1
 ---------+--------------+------     ---------+--------------+------
 a_1      | 0.094  0.082 | 0.174     a_1      | 0.099  0.074 | 0.174
 a_2      | 0.158  0.170 | 0.328     a_2      | 0.149  0.179 | 0.328
 a_3      | 0.255  0.242 | 0.497     a_3      | 0.258  0.239 | 0.497
 y^1      | 0.507  0.493 |           y^1      | 0.507  0.493 |

Table 3: Population frequencies $P^1$ for $r = 0.5$. Under RS we get the same
frequencies as with $r = 1$ (and in fact any other value of $r$). This is because when
$P^0$ is a product distribution there is no effect of recombination under RS.


Polynomial Weights. We use the term PW to distinguish this from other variations
of MWUA. For any $\epsilon > 0$, the $\epsilon$-PW algorithm for a single decision
maker is defined as follows. Suppose first that in time $t$, the player uses
strategy $x^t$. Let $g^t_i$ be the utility to the player when playing some pure
action $i \in [n]$. According to the $\epsilon$-PW algorithm, the strategy of the
player in the next step would be
$x^{t+1}(i) \propto x^t(i)(1 + \epsilon g^t_i)$, where $\propto$ stands for
“proportional to” (we need to normalize, since $x^{t+1}$ has to be a valid
distribution). A special case of the algorithm is the limit case
$\epsilon \to \infty$, where $x^{t+1}(i) \propto x^t(i)\, g^t_i$; i.e., the
probability of playing an action increases proportionally to its expected
performance in the previous round. Unless specified otherwise we assume this limit
case, which we refer to as the parameter-free PW.
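A single PW update can be sketched as below. The function name `pw_step` and the convention of passing `eps=None` for the parameter-free limit are assumptions made for this illustration only.

```python
def pw_step(x, g, eps=None):
    """One Polynomial Weights update of the strategy x given utilities g.

    eps=None selects the parameter-free variant (weights x_i * g_i);
    otherwise the eps-PW variant (weights x_i * (1 + eps * g_i)) is used.
    """
    if eps is None:
        weights = [xi * gi for xi, gi in zip(x, g)]
    else:
        weights = [xi * (1 + eps * gi) for xi, gi in zip(x, g)]
    total = sum(weights)  # normalize so the result is a valid distribution
    return [w / total for w in weights]

# Gene A in the Table 1 example: x^0 and the expected fitness of each allele
# against the marginal y^0 = (0.4, 0.6).
x0 = [0.25, 0.25, 0.5]
g0 = [0.4 * 1.0 + 0.6 * 0.5, 0.4 * 1.5 + 0.6 * 1.2, 0.4 * 1.3 + 0.6 * 0.8]
x1_free = pw_step(x0, g0)
x1_large_eps = pw_step(x0, g0, eps=1e9)
```

For these inputs the parameter-free update gives approximately (0.1741, 0.3284, 0.4975), and a very large `eps` gives nearly the same vector, illustrating the limit case described above.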

 P^2 (SR) | b_1    b_2   | x^2       P^2 (RS) | b_1    b_2
 ---------+--------------+------     ---------+-------------
 a_1      | 0.079  0.042 | 0.122     a_1      | 0.09   0.034
 a_2      | 0.228  0.172 | 0.401     a_2      | 0.203  0.195
 a_3      | 0.294  0.183 | 0.477     a_3      | 0.305  0.173
 y^2      | 0.602  0.398 |           y^2      | 0.598  0.402

Table 4: Population frequencies $P^2$, i.e. after two update steps for $r = 0.5$. The
marginal distributions are no longer the same, since under RS $P^1$ is not a product
distribution.

PW in Games. The fundamental feature of the PW algorithm is that the probability
of playing action $a_i$ changes proportionally to the expected utility of action
$a_i$.

Consider a 2-player game $G$, where $g_{ij}$ is the utility (of both players if $G$ is
an identical interest game) from the joint action $(a_i, b_j)$. In the context of a
game, we can think of at least two different interpretations of the utility of
playing $a_i$, derived from the joint distribution $P$. For this, let
$y^t_j(a_i) = P^t(b_j \mid a_i)$, i.e., the probability that player 2 plays $b_j$
given that player 1 plays $a_i$, according to the distribution $P^t$.
The two interpretations we have in mind are:


1. Set $g^{t(0)}_i = \sum_j y^t_j(a_i)\, g_{ij}$. This is the expected utility that
   player 1 would get for playing $a_i$ in round $t$. This definition is consistent
   with the common interpretation of expected utility in games (e.g., in
   Kale [2007], Sec. 2.3.1).

2. Set $g^{t(1)}_i = \sum_j y^t_j\, g_{ij}$. This is the expected utility that
   player 1 will get in the next round for playing $a_i$ if player 2 will select an
   action independently according to her current marginal probabilities. Thus each
   agent updates her strategy as if the strategies are independent, ignoring any
   observed correlation. This definition results in the PW algorithm used in
   Chastain et al. [2014].


The above definitions require some discussion. While the traditional assump¬ 
tion is that each player only observes a sample from the joint distribution at each 
round, and updates the strategy based on the empirical distribution, strategy up¬ 
dates can also be performed in the same way when the player observes the joint 
distribution at round t, even if it is hard to imagine such a case occurs in practice. 

Intuitively, under the first interpretation, the player considers correlation,
whereas under the second the player assumes independence, and then uses the
marginals to compute the expected utility. For example, suppose that players play
Rock-Paper-Scissors, and the history is 100 repetitions of the sequence [(R,P) (P,S)
(S,S)]. Then under the first interpretation the best action for agent 1 is S (since it
leads to the best expected utility); whereas under the second interpretation the best
action for agent 1 is R, since agent 2 is more likely to play S.^5 Clearly, when
$P^t$ is a product distribution (as in Chastain et al. [2014]) then
$g^{t(0)}_i = g^{t(1)}_i$, and the algorithms coincide. We can also combine the two
interpretations to induce new algorithms. We thus define the PW$^{(\alpha)}$
algorithm (either parameter-free PW$^{(\alpha)}$ or $\epsilon$-PW$^{(\alpha)}$),
where the probability of playing $a_i$ is updated according to
$g^{t(\alpha)}_i = \alpha\, g^{t(1)}_i + (1 - \alpha)\, g^{t(0)}_i$.

^5 We can also think of the two approaches as the two sides of Newcomb’s paradox
[Nozick, 1969]: the player observes a correlation, even though deciding on a
strategy cannot change the expected utility of each action. Thus it is not obvious
whether the observed correlation should be considered in the strategy update.
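The two utility notions and their mixture can be computed from any joint distribution. The sketch below is illustrative: `mixed_utilities` is a hypothetical name, the 2x2 game and the correlated distribution are made-up inputs rather than data from the paper, and it assumes every action of player 1 has positive marginal probability so that the conditional distribution is well defined.

```python
def mixed_utilities(P, G, alpha):
    """Utilities g^(alpha)_i = alpha * g^(1)_i + (1 - alpha) * g^(0)_i for player 1.

    g^(0) uses the conditional distribution P(b_j | a_i) (correlation-aware),
    g^(1) uses the marginal distribution of player 2 (independence assumed).
    Assumes x_i > 0 for every action i.
    """
    n, m = len(P), len(P[0])
    x = [sum(P[i]) for i in range(n)]
    y = [sum(P[i][j] for i in range(n)) for j in range(m)]
    g0 = [sum(P[i][j] / x[i] * G[i][j] for j in range(m)) for i in range(n)]
    g1 = [sum(y[j] * G[i][j] for j in range(m)) for i in range(n)]
    return [alpha * g1[i] + (1 - alpha) * g0[i] for i in range(n)]

# Hypothetical correlated joint distribution and identical-interest payoffs.
P = [[0.4, 0.1], [0.1, 0.4]]
G = [[2.0, 1.0], [1.0, 3.0]]
g_corr = mixed_utilities(P, G, 0.0)    # first interpretation, g^(0)
g_indep = mixed_utilities(P, G, 1.0)   # second interpretation, g^(1)
g_half = mixed_utilities(P, G, 0.5)
```

For this input the correlation-aware utilities are (1.8, 2.6) and the independence-based ones are (1.5, 2.0); on a product distribution the two would agree for every `alpha`.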


Exponential Weights. The Hedge algorithm [Freund and Schapire, 1995, 1999]
is another variation of MWUA that is very similar to PW. The difference is that
the weight of action $i$ in each step changes by a factor that is exponential in the
utility, rather than linear. That is,
$x^{t+1}(i) \propto x^t(i)(1 + \epsilon)^{g^t_i}$. For negligible $\epsilon > 0$,
$\epsilon$-Hedge and $\epsilon$-PW are essentially the same, but for large
$\epsilon$ they may behave quite differently.
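The relation between the two update rules can be seen on a toy input. This is a small sketch with hypothetical helper names (`eps_pw_step`, `hedge_step`) and made-up utilities, meant only to illustrate the small-versus-large $\epsilon$ behavior described above.

```python
def eps_pw_step(x, g, eps):
    """eps-PW: weights change by the linear factor (1 + eps * g_i)."""
    weights = [xi * (1 + eps * gi) for xi, gi in zip(x, g)]
    total = sum(weights)
    return [w / total for w in weights]

def hedge_step(x, g, eps):
    """eps-Hedge: weights change by the exponential factor (1 + eps) ** g_i."""
    weights = [xi * (1 + eps) ** gi for xi, gi in zip(x, g)]
    total = sum(weights)
    return [w / total for w in weights]

x = [0.5, 0.5]
g = [1.0, 3.0]  # made-up utilities for the two actions

# Largest coordinate-wise gap between the two updates, for small and large eps.
gap_small = max(abs(a - b) for a, b in zip(eps_pw_step(x, g, 1e-4), hedge_step(x, g, 1e-4)))
gap_large = max(abs(a - b) for a, b in zip(eps_pw_step(x, g, 5.0), hedge_step(x, g, 5.0)))
```

With `eps = 1e-4` the two updates differ only at second order in `eps`, whereas with `eps = 5` PW moves to about (0.27, 0.73) and Hedge to about (0.03, 0.97).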


3 Analysis of the SR dynamics 


In this section we prove that the SR population dynamics of marginal allele
frequencies coincide precisely with the multiplicative updates dynamics in the
corresponding game. This extends Theorem 4 in Chastain et al. [2014] (SI text), in that
it holds without weak selection or the assumption of product distributions through
multiple iterations. We also generalize the proposition to hold for any number of
loci/players in Appendix A.

Proposition 1. Let $W$ be any fitness matrix, and consider the game $G$ where
$g_{ij} = w_{ij}$. Then under the SR population dynamics, for any distribution $P^t$
and any $r \in [0, 1]$, we have

$$x^{t+1}_i = \frac{1}{\bar{w}^t}\, x^t_i\, g^{t(0)}_i.$$


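Proof. By the SR population dynamics (Eq. (2)),

$$\begin{aligned}
x^{t+1}_i = \sum_j p^{t+1}_{ij}
&= r \sum_j p^{SR}_{ij} + (1 - r) \sum_j p^S_{ij} \\
&= r \sum_j \frac{\sum_l \sum_k p^t_{il} w_{il}\, p^t_{kj} w_{kj}}{(\bar{w}^t)^2} + (1 - r) \sum_j \frac{p^t_{ij} w_{ij}}{\bar{w}^t} \\
&= r\, \frac{1}{(\bar{w}^t)^2} \sum_l p^t_{il} w_{il} \sum_k \sum_j p^t_{kj} w_{kj} + (1 - r)\, \frac{1}{\bar{w}^t} \sum_j p^t_{ij} w_{ij}.
\end{aligned}$$

Then since $\sum_k \sum_j p^t_{kj} w_{kj} = \bar{w}^t$ is the average fitness at time $t$,

$$\begin{aligned}
x^{t+1}_i
&= r\, \frac{\sum_l p^t_{il} w_{il}}{\bar{w}^t} + (1 - r)\, \frac{\sum_j p^t_{ij} w_{ij}}{\bar{w}^t}
 = \frac{1}{\bar{w}^t} \sum_j p^t_{ij} w_{ij} \\
&= \frac{1}{\bar{w}^t}\, x^t_i \sum_j y^t_j(a_i)\, w_{ij}
 = \frac{1}{\bar{w}^t}\, x^t_i\, g^{t(0)}_i,
\end{aligned}$$

thus the recombination factor $r$ does not play a direct role in the new marginal
under the SR dynamics. It does have an indirect role though, since it affects the
correlation, and thus the marginal distribution at the next generation $t + 2$. □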
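Proposition 1 can be sanity-checked numerically on a correlated population. The sketch below is an illustration, not part of the proof: `sr_step` is a hypothetical helper implementing Eq. (2), and the correlated distribution `P` is a made-up input. It compares the marginals after one SR step, for several values of $r$, with the prediction $\frac{1}{\bar{w}^t} \sum_j p^t_{ij} w_{ij}$.

```python
def sr_step(P, W, r):
    """One SR step (Eq. (2)): selection, then recombination at rate r."""
    n, m = len(P), len(P[0])
    w_bar = sum(P[i][j] * W[i][j] for i in range(n) for j in range(m))
    PS = [[P[i][j] * W[i][j] / w_bar for j in range(m)] for i in range(n)]
    xs = [sum(PS[i]) for i in range(n)]
    ys = [sum(PS[i][j] for i in range(n)) for j in range(m)]
    return [[r * xs[i] * ys[j] + (1 - r) * PS[i][j] for j in range(m)]
            for i in range(n)]

W = [[1.0, 0.5], [1.5, 1.2], [1.3, 0.8]]       # Table 1 fitness matrix
P = [[0.20, 0.05], [0.05, 0.20], [0.10, 0.40]]  # made-up correlated population

w_bar = sum(P[i][j] * W[i][j] for i in range(3) for j in range(2))
# Predicted marginal: x_i * g^(0)_i / w_bar = sum_j p_ij * w_ij / w_bar.
predicted = [sum(P[i][j] * W[i][j] for j in range(2)) / w_bar for i in range(3)]

# Largest deviation of the simulated marginal from the prediction, per value of r.
errors = {}
for r in (0.0, 0.3, 1.0):
    x_next = [sum(row) for row in sr_step(P, W, r)]
    errors[r] = max(abs(a - b) for a, b in zip(x_next, predicted))

# The joint distribution itself does depend on r.
joint_differs = abs(sr_step(P, W, 0.0)[0][0] - sr_step(P, W, 1.0)[0][0]) > 1e-3
```

The marginals agree with the prediction for every tested `r`, while the joint distributions for `r = 0` and `r = 1` differ, which is the indirect role of recombination noted at the end of the proof.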


Theorem 4 in Chastain et al. [2014] follows as a special case when $P^t$ is a
product distribution. By repeatedly applying Proposition 1 we get the following
result, which holds for any value of $r$.


Corollary 1. Let $W$ be a fitness matrix, and let $P^0$ be any distribution. Suppose
that $P^{t+1}$ is attained from $P^t$ by the SR population dynamics, and that
$x^{t+1}, y^{t+1}$ are attained from $P^t$ by players using the parameter-free
PW$^{(0)}$ algorithm in the game $G = W$. Then for all $t > 0$ and any $i$,
$x^t_i = \sum_j p^t_{ij}$.

It is important to note that the marginal distributions $x^t, y^t$ do not determine
$P^t$ completely. Thus the PW algorithm specifies the strategy of each player
(regardless of $r$), but not how these strategies are correlated.


4 Analysis of the RS dynamics 


Turning to the RS population dynamics, our starting point is Lemma 3 in Chastain
et al. [2014] (SI text), which states that
$p^{t+1}_{ij} = \frac{1}{\bar{w}^t} w_{ij} x^t_i y^t_j$ (under the assumption that
$P^t$ is a product distribution). We establish a similar property for general
distributions. We use the fact that for any fitness matrix $W$ and distribution $P^t$,

$$p^{t+1}_{ij} = \frac{1}{\bar{w}^R} \left( r\, w_{ij} x^t_i y^t_j + (1 - r)\, w_{ij} p^t_{ij} \right).$$

This follows immediately from the definition (Eq. (3)). Recall that
$g^{t(\alpha)}_i = \alpha\, g^{t(1)}_i + (1 - \alpha)\, g^{t(0)}_i$. We derive an
alternative extension of Theorem 4 in Chastain et al. [2014] (SI text) for the RS
dynamics.

Proposition 2. Let $W$ be any fitness matrix, and consider the game $G$ where
$g_{ij} = w_{ij}$. Then under the RS population dynamics, for any distribution $P^t$
and any $r \in [0, 1]$,

$$x^{t+1}_i = \frac{1}{\bar{w}^R}\, x^t_i\, g^{t(r)}_i.$$
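Proof. By the RS population dynamics (Eq. (3)),

$$\begin{aligned}
x^{t+1}_i = \sum_j p^{t+1}_{ij}
&= \frac{1}{\bar{w}^R} \sum_j \left( r\, w_{ij} x^t_i y^t_j + (1 - r)\, w_{ij} p^t_{ij} \right) \\
&= \frac{1}{\bar{w}^R} \sum_j \left( r\, w_{ij} x^t_i y^t_j + (1 - r)\, w_{ij} x^t_i y^t_j(a_i) \right) \\
&= \frac{1}{\bar{w}^R}\, x^t_i \left( r \sum_j w_{ij} y^t_j + (1 - r) \sum_j w_{ij} y^t_j(a_i) \right).
\end{aligned}$$

Finally, by the definitions of $g^{t(1)}_i$ and $g^{t(0)}_i$,

$$x^{t+1}_i = \frac{1}{\bar{w}^R}\, x^t_i \left( r\, g^{t(1)}_i + (1 - r)\, g^{t(0)}_i \right) = \frac{1}{\bar{w}^R}\, x^t_i\, g^{t(r)}_i.$$

In contrast to the SR dynamics, here $r$ appears explicitly in the marginal
distribution. □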
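Proposition 2 admits the same kind of numerical sanity check. In the sketch below, `rs_step` is a hypothetical helper implementing Eq. (3) and `P` is a made-up correlated population; the prediction uses the mixed utility $g^{t(r)}_i = r\, g^{t(1)}_i + (1 - r)\, g^{t(0)}_i$ and normalizes by $\sum_i x^t_i g^{t(r)}_i$, which equals $\bar{w}^R$.

```python
def rs_step(P, W, r):
    """One RS step (Eq. (3)): recombination at rate r, then selection."""
    n, m = len(P), len(P[0])
    x = [sum(P[i]) for i in range(n)]
    y = [sum(P[i][j] for i in range(n)) for j in range(m)]
    U = [[W[i][j] * (r * x[i] * y[j] + (1 - r) * P[i][j]) for j in range(m)]
         for i in range(n)]
    w_R = sum(sum(row) for row in U)
    return [[u / w_R for u in row] for row in U]

W = [[1.0, 0.5], [1.5, 1.2], [1.3, 0.8]]       # Table 1 fitness matrix
P = [[0.20, 0.05], [0.05, 0.20], [0.10, 0.40]]  # made-up correlated population
r = 0.3

n, m = 3, 2
x = [sum(P[i]) for i in range(n)]
y = [sum(P[i][j] for i in range(n)) for j in range(m)]
g0 = [sum(P[i][j] / x[i] * W[i][j] for j in range(m)) for i in range(n)]  # correlated
g1 = [sum(y[j] * W[i][j] for j in range(m)) for i in range(n)]            # independent
g_r = [r * g1[i] + (1 - r) * g0[i] for i in range(n)]

w_R = sum(x[i] * g_r[i] for i in range(n))
predicted = [x[i] * g_r[i] / w_R for i in range(n)]
simulated = [sum(row) for row in rs_step(P, W, r)]
max_error = max(abs(a - b) for a, b in zip(predicted, simulated))
```

The simulated marginals match the PW$^{(r)}$ prediction up to floating-point error, and changing `r` changes both, unlike in the SR case.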


So we get that under RS the marginal frequency of allele $a_i$ is updated
according to an expected utility that takes only part of the correlation into account.
The weight placed on the correlated term is $1 - r$, so it shrinks as the
recombination rate $r$ grows. We get a similar result to Corollary 1.

Corollary 2. Let $W$ be a fitness matrix, and let $P^0$ be any distribution. Suppose
that $P^{t+1}$ is attained from $P^t$ by the RS population dynamics, and that
$x^{t+1}, y^{t+1}$ are attained from $P^t$ by players using the parameter-free
PW$^{(r)}$ algorithm in the game $G = W$. Then for all $t > 0$ and any $i$,
$x^t_i = \sum_j p^t_{ij}$.


5 Convergence and Equilibrium 

In this section, we consider implications of the general theory on the
correspondence between sexual population dynamics and multiplicative-weights
algorithms on convergence properties.


5.1 Diminishing regret


In Chastain et al. [2014] (Sections 3 and 4 of the SI text), the authors apply
standard properties of MWUA to show that the cumulative external regret of each gene
is bounded (Corollary 5 there). In other words, if in retrospect gene 1 would have
“played” some fixed allele $a_i$ throughout the game, the cumulative fitness
(summing over all iterations) would not have been much better. This result leans only
on the properties of the algorithm, and does not require the independence of
strategies. Thus we will write a similar regret bound explicitly in our more general
model.

Consider any fitness matrix $W$ whose selection strength is $s$. Let
$AF^T_i = \frac{1}{T} \sum_{t=1}^T g^{t(0)}_i$ (the average fitness in retrospect if
allele $a_i$ had been used throughout the game), and
$AF^T_{SR} = \frac{1}{T} \sum_{t=1}^T \bar{w}^t$ (the actual average fitness under
the SR dynamics).

Corollary 3. For any $T \in \mathbb{N}$, any $s \in (0, \frac{1}{2})$ and all
$i \le n$, $AF^T_{SR} \ge AF^T_i - s^2 - \ln(n)/T$.


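Proof. Set $A_{ij} = \frac{w_{ij} - 1}{s}$,
$m^{(t)}_i = \sum_j y^t_j(a_i) A_{ij}$, and $\epsilon = s$. Note that $A_{ij}$ and
$m^{(t)}_i$ are in the range $[-1, 1]$. Intuitively, $m^{(t)}_i$ is the expected
profit of player 1 from playing action $a_i$ in the “differential game”
$A = \frac{W - 1}{s}$.

Theorem 3 [Kale, 2007] states that under the $\epsilon$-PW algorithm,^6

$$\sum_{t=1}^T m^{(t)} \cdot p^{(t)} \ge \sum_{t=1}^T m^{(t)}_i - \epsilon \sum_{t=1}^T |m^{(t)}_i| - \frac{\ln(n)}{\epsilon}, \qquad (4)$$

where $p^{(t)}_i$ is the probability that the decision maker chose action $a_i$ in
iteration $t$ (thus $p^{(t)}_i = x^t_i$ by our notation).

The proof follows directly from the theorem. Observe that
$m^{(t)}_i = (g^{t(0)}_i - 1)/s$, and

$$m^{(t)} \cdot p^{(t)} = \sum_i x^t_i m^{(t)}_i = \sum_i x^t_i \sum_j y^t_j(a_i) \frac{w_{ij} - 1}{s} = \sum_{i,j} p^t_{ij} \frac{w_{ij} - 1}{s} = \frac{\bar{w}^t - 1}{s}.$$

Thus

$$AF^T_i = \frac{1}{T} \sum_{t=1}^T \left( s\, m^{(t)}_i + 1 \right) = 1 + \frac{s}{T} \sum_{t=1}^T m^{(t)}_i,$$

$$AF^T_{SR} = \frac{1}{T} \sum_{t=1}^T \bar{w}^t = \frac{1}{T} \sum_{t=1}^T \left( 1 + s\, m^{(t)} \cdot p^{(t)} \right) = 1 + \frac{\epsilon}{T} \sum_{t=1}^T m^{(t)} \cdot p^{(t)} \quad \text{(replacing } \epsilon = s\text{)}.$$

Plugging in Eq. (4),

$$\begin{aligned}
AF^T_{SR}
&\ge 1 + \frac{\epsilon}{T} \left( \sum_{t=1}^T m^{(t)}_i - \epsilon \sum_{t=1}^T |m^{(t)}_i| - \frac{\ln(n)}{\epsilon} \right) \\
&\ge 1 + \frac{s}{T} \left( \sum_{t=1}^T m^{(t)}_i - \sum_{t=1}^T s \right) - \frac{1}{T} \ln(n) \\
&= \left( 1 + \frac{s}{T} \sum_{t=1}^T m^{(t)}_i \right) - s^2 - \frac{1}{T} \ln(n) \\
&= AF^T_i - s^2 - \ln(n)/T,
\end{aligned}$$

as required. □

^6 Kale [2007] analyzes the Exponential Weights algorithm, but a slight modification
of the analysis works for PW.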
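The bound of Corollary 3 can be illustrated by simulating the SR dynamics and tracking both averages. The sketch below is an assumption-laden illustration: the fitness matrix is a made-up example with selection strength $s = 0.3$, the population starts from the uniform distribution (so that the initial PW strategy is uniform), and `sr_step` and `simulate_regret` are hypothetical names.

```python
import math

def sr_step(P, W, r):
    """One SR step (Eq. (2)): selection, then recombination at rate r."""
    n, m = len(P), len(P[0])
    w_bar = sum(P[i][j] * W[i][j] for i in range(n) for j in range(m))
    PS = [[P[i][j] * W[i][j] / w_bar for j in range(m)] for i in range(n)]
    xs = [sum(PS[i]) for i in range(n)]
    ys = [sum(PS[i][j] for i in range(n)) for j in range(m)]
    return [[r * xs[i] * ys[j] + (1 - r) * PS[i][j] for j in range(m)]
            for i in range(n)]

def simulate_regret(W, s, T, r=0.5):
    """Return (AF_SR, [AF_i for each allele of gene A], s^2 + ln(n)/T)."""
    n, m = len(W), len(W[0])
    P = [[1.0 / (n * m)] * m for _ in range(n)]  # uniform initial population
    cum_w = 0.0
    cum_g = [0.0] * n
    for _ in range(T):
        x = [sum(row) for row in P]
        cum_w += sum(P[i][j] * W[i][j] for i in range(n) for j in range(m))
        for i in range(n):
            # g^(0)_i: expected fitness of allele a_i given the current correlation.
            cum_g[i] += sum(P[i][j] * W[i][j] for j in range(m)) / x[i]
        P = sr_step(P, W, r)
    return cum_w / T, [c / T for c in cum_g], s * s + math.log(n) / T

# Made-up fitness matrix with all entries in [1 - s, 1 + s] for s = 0.3.
W = [[1.0, 0.7], [1.3, 1.2], [1.1, 0.8]]
af_sr, af_alleles, slack = simulate_regret(W, s=0.3, T=100)
bound_holds = all(af_sr >= af_i - slack for af_i in af_alleles)
```

In this run the actual average fitness stays within the stated slack of the best fixed allele in hindsight, as the corollary predicts; the check does not depend on the recombination rate passed as `r`.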


We highlight that the bound on the regret of each player (or gene) stated in this 
result depends only on the algorithm used by the agent, and not on the strategies 
of other agents. These may be independent, correlated, or even chosen in an ad¬ 
versarial manner. For simplicity we present the proof for two players/genes. The 
extension to any number of players is immediate because the theorem bounds the 
regret of each agent separately. 

By taking $s$ to zero and $T$ to infinity, we get that the average cumulative regret
$AF^T_i - AF^T_{SR}$ tends to zero, as stated in Chastain et al. [2014] (they use a
more refined form of the inequality that contains the entropy of $P^t$ rather than
$\ln n$).

For the RS dynamics we get something similar but not quite the same. Since
$g^{t(r)}_i$ is not exactly the expected fitness at time $t$, we get that cumulative
regret is diminishing but not w.r.t. the actual average fitness $\bar{w}^t$. That is,
the regret is determined as if the actual expected fitness of action $a_i$ is
$g^{t(r)}_i$. More formally, we get a variation of Corollary 3 where
$AF^T_i = \frac{1}{T} \sum_{t=1}^T g^{t(r)}_i$ and
$AF^T_{RS} = \frac{1}{T} \sum_{t=1}^T \sum_i x^t_i\, g^{t(r)}_i$.


5.2 Convergence under weak selection

In normal-form games, following strategies with bounded or diminishing regret
does not, in general, guarantee convergence to a fixed point in the strategy space.^7
For some classes of games though, much more is known. For example, if all players
in a potential game apply the $\epsilon$-Hedge algorithm, with a sufficiently small
$\epsilon$, then $P^t$ converges to a Nash equilibrium [Kleinberg et al., 2009], and
almost always to a pure Nash equilibrium. Similar results have been shown for concave
games [Even-Dar et al., 2009]. Since identical-interest games are both potential
games and concave games, and since for small $\epsilon$ we have that Hedge and PW are
essentially the same, these results apply to our setting. This means that under each
of the RS and SR dynamics, the population converges to a stable state, for a
sufficiently low selection strength $s$.

This implication is not new, and has been shown independently in the evolutionary
biology literature. Indeed, Nagylaki et al. [1999] prove that under weak
selection, the population dynamics converges to a point distribution from any
initial state (that is, to a Nash equilibrium of the subgame induced by the support
of the initial distribution). Note that under weak selection, Corollary 3 becomes
trivial: once in a pure Nash equilibrium $(a_{i^*}, b_{j^*})$ (say at time $t^*$),
the optimal action of agent 1 is to keep playing $a_{i^*}$. Thus for any $t > t^*$,
$\bar{w}^t = g^{t(0)}_{i^*}$, and the cumulative regret does not increase further.


6 Discussion 


Chastain et al. [2014] extend an interesting connection between evolution, learning,
and games from asexual reproduction (i.e., replicator dynamics) to sexual
reproduction. The proof of Theorem 4 in Chastain et al. [2014] gives a formal meaning
to this connection. Namely, that the strategy update of each player who is using
PW$^{(1)}$ in the fitness game coincides with the change in allele frequencies of the
corresponding gene (under weak selection and product distributions). This relation
is generalized in our Propositions 1 and 2, since for product distributions
PW$^{(\alpha)}$ is the same for all $\alpha$.

Chastain et al. [2014] also claim something stronger: that the population
dynamics is precisely the PW dynamics. The natural formal interpretation of this
conclusion would be in the spirit of our Corollary 1, i.e., that allele distributions
and players' strategies would coincide after any number of steps. In our case we
prove this for the marginal probabilities. But as we have discussed, their
conclusion only follows from their Theorem 4 under the assumption that $P^t$ remains a
product distribution. This is counterfactual, in that $P^{t+1}$ is in general not a product
distribution (under their assumptions on the dynamic process), and thus the next
step of the PW algorithm and the population dynamics would not be precisely
equivalent but only approximately equivalent. The approximation becomes less
accurate in each step, and in fact even under weak selection the population dynamics
may diverge from the Wright manifold, or converge to a different outcome than the
PW algorithm, as we show in Appendix B. Thus while the intuition of Chastain
et al. [2014] was correct, the only way to rectify their analysis is via the more
general proof without assumptions on the selection strength (even if we accept weak
selection as biologically plausible).

* It is known that the average joint distribution over all iterations converges to the set of correlated
equilibria [Blum and Mansour, 2007]. This is less relevant to us because we are interested in the
limit of $P^t$.


What does evolution maximize? In Chastain et al. [2014] (Corollary 5 in the SI
text), it is also shown that under weak selection, “population genetics is tantamount
to each gene optimizing at generation t a quantity equal to the cumulative expected
fitness over all generations up to t,” (plus the entropy). While this is technically
correct (our Cor. 3 is a restatement of this result), we feel that an unwary reader
might reach the wrong impression, that this is a mathematical explanation of some
guarantee on the average fitness of the population. We thus emphasize that
both Chastain et al. [2014] and our paper establish only the property of diminishing
regret, which is already implied when $P^t$ converges to a Nash equilibrium. Players
never have regret in a Nash equilibrium, and thus the cumulative regret tends to zero
after the equilibrium is played sufficiently many times.

Thus the population dynamics cannot provide any guarantees on fitness (or on 
any other property) that are not already implied by an arbitrary Nash equilibrium. 
In the evolutionary context this means that the outcome can be as bad as the worst 
local maximum of the fitness matrix. Also note that convergence is to a point 
distribution (a pure Nash equilibrium, see Sec. 5.2), and thus its entropy is 0 and
irrelevant for the maximization claim. 


Convergence without weak selection It is an open question as to what other 
natural conditions are sufficient to guarantee convergence of sexual population dy¬ 
namics. We have conducted simulations that show that convergence to a pure equi¬ 
librium occurs w.h.p. even without weak selection, and in fact the convergence 
speed increases as the selection strength $s$ (or the learning rate $\epsilon$) grows. At the same
time, the quality of the solution/population reached seems to be the same regardless 
of the selection strength/learning rate (we measured quality as the fitness of the lo¬ 
cal maximum the dynamics converged to, normalized w.r.t. the global maximum). 
[Figure 1 panels: "Convergence speed" (left) and "Equilibrium quality" (right, horizontal axis Q).]

Figure 1: On the left, $F(T)$ is the fraction of instances that converge (reach
$\max_{ij} p^t_{ij}$ > 1 - 10“®) within at most $T$ generations. On the right, $F(Q)$ is
the fraction of instances that converge to an equilibrium with quality $q$ at most
$Q$, where $q$ is the ratio between the average fitness in the equilibrium that
was reached and the optimal fitness. Results are for random fitness matrices of
size 8x5, where $w_{ij}$ is sampled uniformly at random from $[1 - s, 1 + s]$. Note
that most instances do not converge to the optimal outcome.


Both trends are visible in Figure 1 for 8x5 matrices, based on 1000 instances for
each plot. Similar results are obtained with other sizes of matrices.
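The kind of experiment described above can be sketched as follows. This is only an illustrative sketch, not the code used for Figure 1: the two-locus SR update (selection followed by recombination with rate r), the random initial distribution, the tolerance `tol`, the generation cap `max_gen`, and all function names are assumptions made here for illustration.

```python
import random

def sr_step(P, W, r):
    """One generation of two-locus SR dynamics (assumed form): selection, then recombination."""
    n, m = len(P), len(P[0])
    wbar = sum(P[i][j] * W[i][j] for i in range(n) for j in range(m))
    S = [[P[i][j] * W[i][j] / wbar for j in range(m)] for i in range(n)]  # after selection
    x = [sum(S[i]) for i in range(n)]                       # marginals of locus 1
    y = [sum(S[i][j] for i in range(n)) for j in range(m)]  # marginals of locus 2
    return [[r * x[i] * y[j] + (1 - r) * S[i][j] for j in range(m)] for i in range(n)]

def random_matrix(n, m, s, rng):
    """Fitness matrix with entries sampled uniformly from [1 - s, 1 + s]."""
    return [[rng.uniform(1 - s, 1 + s) for _ in range(m)] for _ in range(n)]

def simulate(W, r, rng, tol=1e-6, max_gen=20_000):
    """Run from a random interior start (an assumption) until some p_ij exceeds 1 - tol.

    Returns (generations, quality), where quality is the average fitness at the
    point reached divided by the largest entry of W, or (None, None) if the cap
    max_gen is hit first.
    """
    n, m = len(W), len(W[0])
    raw = [[rng.random() + 1e-9 for _ in range(m)] for _ in range(n)]
    total = sum(map(sum, raw))
    P = [[v / total for v in row] for row in raw]
    best = max(map(max, W))
    for t in range(1, max_gen + 1):
        P = sr_step(P, W, r)
        if max(map(max, P)) > 1 - tol:
            wbar = sum(P[i][j] * W[i][j] for i in range(n) for j in range(m))
            return t, wbar / best
    return None, None

if __name__ == "__main__":
    rng = random.Random(0)
    W = random_matrix(8, 5, s=0.5, rng=rng)
    print(simulate(W, r=0.5, rng=rng))
```

Repeating `simulate` over many random matrices and several values of s would give empirical distributions of the convergence time and of the quality, in the spirit of the two panels of Figure 1.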

However, it is known that the sexual population dynamics on general fitness
matrices (even on 4 x 4 matrices) does not always converge, and explicit examples
have been constructed [Hastings, 1981; Akin, 1983; Hofbauer and Iooss, 1984].
By Corollaries 1 and 2, convergence of the PW algorithm to a pure Nash
equilibrium, and convergence of the population dynamics to a point distribution are the
same thing. Thus characterizing the conditions under which these dynamics
converge will answer two questions at once.


Conclusions We formally describe a precise connection between population
dynamics and the multiplicative weights update algorithm. For this connection, we
adopt a version of MWUA that takes the correlation of player strategies into
account, while still supporting no-regret claims. More specifically, two different
variations of the Polynomial Weights subclass of MWUA each coincide with the
marginal allele distribution under the two common sexual population dynamics
(SR and RS). It is important to note that the correspondence that we establish is
between the marginal frequencies/probabilities, rather than the full joint
distribution.

Notably, weak selection is not required to make these connections. Yet, it is
known that weak selection provides an additional guarantee, which is that the
dynamics converge to a particular population distribution [Nagylaki, 1993]. It
remains an open question to understand what other conditions are sufficient for
convergence of the PW algorithm in identical interest games. Solving this question
will also uncover more cases where the fundamental theorem of natural selection
applies.
A Extension to Multiple Genes


A $k$-locus haploid has $k$ genes, each of which is inherited from one of its two
parents. In this appendix we show how to extend our main results to a haploid with
$k > 2$ loci.

A.1 Notation

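We consider a haploid with $k$ loci, each with $n_j$ alleles, $j \le k$. We denote $K =
[k] = \{1, 2, \ldots, k\}$, and use $J$ to denote subsets of $K$.

A genotype is defined by a vector of indices $\mathbf{i}_K = (i_1, \ldots, i_k)$ for $i_j \in [n_j]$.
We denote by $\mathcal{I}(J)$ the set of all $\prod_{j \in J} n_j$ partial index vectors of the form $(i_j)_{j \in J}$.
We sometimes concatenate two or more partial genotypes: $\mathbf{i}_{J,J'} = (\mathbf{i}_J, \mathbf{i}_{J'})$ for
some $\mathbf{i}_J \in \mathcal{I}(J)$, $\mathbf{i}_{J'} \in \mathcal{I}(J')$. We use $-J$ to denote $K \setminus J$.

The fitness of a genotype $\mathbf{i}_K$ is denoted by $w_{\mathbf{i}_K}$. $W = (w_{\mathbf{i}_K})_{\mathbf{i}_K \in \mathcal{I}(K)}$ is called
the fitness landscape (which is a matrix for $k = 2$). Similarly, the population
frequency of genotype $\mathbf{i}_K$ at time $t$ is denoted by $p^t_{\mathbf{i}_K}$, and $P^t = (p^t_{\mathbf{i}_K})_{\mathbf{i}_K \in \mathcal{I}(K)}$.
The average fitness at time $t$ is

$$\bar{w}^t = \sum_{i_1=1}^{n_1} \cdots \sum_{i_k=1}^{n_k} p^t_{\mathbf{i}_K}\, w_{\mathbf{i}_K}. \tag{5}$$

Let $x^t_j$ be the marginal distribution of locus $j \in K$ at time $t$, i.e., for all $i_j \in [n_j]$,

$$x^t_{i_j} = \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}.$$

In the special case of 2 loci, $K = \{1, 2\}$, and $x^t_{i_1}, x^t_{i_2}$ correspond to $x^t_i, y^t_j$ as used
in the main text. We also define the marginal fitness of allele $i_j \in [n_j]$ at time $t$ as
the average fitness of all the population with allele $i_j$. That is,

$$w^t_{i_j} = \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} \frac{p^t_{i_j, \mathbf{i}_{-j}}}{x^t_{i_j}}\, w_{i_j, \mathbf{i}_{-j}}. \tag{6}$$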
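To make the notation concrete, the following sketch computes the marginal distribution, the marginal fitness, and the average fitness for a small toy instance. The representation (genotypes as tuples, $P^t$ and $W$ as dictionaries), the function names, and the toy numbers are assumptions chosen here for illustration only.

```python
from itertools import product

def marginal(P, j, a):
    """Frequency of allele a at locus j: sum of p over genotypes g with g[j] == a."""
    return sum(p for g, p in P.items() if g[j] == a)

def marginal_fitness(P, W, j, a):
    """Average fitness of the sub-population that carries allele a at locus j."""
    x = marginal(P, j, a)
    return sum(p * W[g] for g, p in P.items() if g[j] == a) / x

def average_fitness(P, W):
    """Population-wide average fitness: sum of p * w over all genotypes."""
    return sum(p * W[g] for g, p in P.items())

# Hypothetical toy instance: k = 3 loci with 2, 3 and 2 alleles.
n = (2, 3, 2)
genotypes = list(product(*(range(m) for m in n)))
W = {g: 1.0 + 0.1 * ((g[0] + 2 * g[1] + 3 * g[2]) % 4) for g in genotypes}
raw = {g: 1.0 + g[0] + g[1] * g[2] for g in genotypes}   # arbitrary positive weights
total = sum(raw.values())
P = {g: v / total for g, v in raw.items()}               # a correlated joint distribution
```

A quick sanity check on these definitions is that, for every locus, the marginal frequencies sum to 1 and the marginal fitnesses weighted by the marginal frequencies add up to the average fitness.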


A.2 RS dynamics 

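According to the multi-dimensional extension of the RS dynamics,

$$p^{t+1}_{\mathbf{i}_K} = \frac{1}{\bar{w}^t_R}\Big( r\, w_{\mathbf{i}_K} \prod_{j \in K} x^t_{i_j} + (1 - r)\, w_{\mathbf{i}_K}\, p^t_{\mathbf{i}_K} \Big), \tag{7}$$

where $\bar{w}^t_R$ denotes the normalizing factor, i.e., the sum of the bracketed term over all $\mathbf{i}_K \in \mathcal{I}(K)$.

Given a game $G$ and a joint distribution $P^t$, let $g^t_{i_j} = \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} w_{i_j, \mathbf{i}_{-j}} \prod_{j' \neq j} x^t_{i_{j'}}$.
That is, the expected utility of playing
$a_{i_j}$ when every agent $j'$ independently plays $x^t_{j'}$.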

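Lemma 1. Let $W$ be any fitness matrix, and consider the game $G = W$. Then
under the RS population dynamics, for any distribution $P^t$ and any $r \in [0, 1]$,

$$x^{t+1}_{i_j} = \frac{x^t_{i_j}}{\bar{w}^t_R}\Big( r\, g^t_{i_j} + (1 - r)\, w^t_{i_j} \Big),$$

where $\bar{w}^t_R$ is the normalizing factor of the RS dynamics.

Proof.

$$\begin{aligned}
x^{t+1}_{i_j} &= \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^{t+1}_{i_j, \mathbf{i}_{-j}} && \text{(By definition)} \\
&= \frac{1}{\bar{w}^t_R} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} w_{i_j, \mathbf{i}_{-j}} \Big( r \prod_{j' \in K} x^t_{i_{j'}} + (1 - r)\, p^t_{i_j, \mathbf{i}_{-j}} \Big) && \text{(By Eq. (7))} \\
&= \frac{1}{\bar{w}^t_R} \Big( r\, x^t_{i_j} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} w_{i_j, \mathbf{i}_{-j}} \prod_{j' \neq j} x^t_{i_{j'}}
 + (1 - r) \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}} \Big) \\
&= \frac{x^t_{i_j}}{\bar{w}^t_R} \Big( r\, g^t_{i_j} + (1 - r)\, w^t_{i_j} \Big).
\end{aligned}$$

□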
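Lemma 1 can be checked numerically on a small instance. The sketch below assumes that one RS generation first mixes the joint distribution with the product of its marginals (with weight r) and then applies selection, normalizing by the sum of the unnormalized frequencies; the data layout, function names, and toy numbers are hypothetical.

```python
from itertools import product
from math import prod

def marginals(P, n):
    """x[j][a]: frequency of allele a at locus j."""
    return [[sum(p for g, p in P.items() if g[j] == a) for a in range(n[j])]
            for j in range(len(n))]

def rs_step(P, W, n, r):
    """One RS generation (assumed form). Returns the new distribution and the normalizer."""
    x = marginals(P, n)
    mixed = {g: r * prod(x[l][g[l]] for l in range(len(n))) + (1 - r) * P[g] for g in P}
    unnorm = {g: W[g] * mixed[g] for g in P}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}, z

def lemma1_rhs(P, W, n, r, j, a, z):
    """x * (r * g + (1 - r) * w) / z for allele a at locus j."""
    x = marginals(P, n)
    # g: expected fitness of allele a when all other loci are drawn independently
    g_val = sum(W[g] * prod(x[l][g[l]] for l in range(len(n)) if l != j)
                for g in P if g[j] == a)
    # w: marginal fitness of allele a under the actual joint distribution
    w_val = sum(P[g] * W[g] for g in P if g[j] == a) / x[j][a]
    return x[j][a] * (r * g_val + (1 - r) * w_val) / z

# Hypothetical toy instance: k = 3 loci with 2, 3 and 2 alleles.
n = (2, 3, 2)
genotypes = list(product(*(range(m) for m in n)))
W = {g: 1.0 + 0.1 * ((g[0] + 2 * g[1] + 3 * g[2]) % 4) for g in genotypes}
raw = {g: 1.0 + g[0] + g[1] * g[2] for g in genotypes}
total = sum(raw.values())
P = {g: v / total for g, v in raw.items()}
```

Comparing `marginals(rs_step(P, W, n, r)[0], n)` with `lemma1_rhs` for each locus and allele should give agreement up to floating-point error for any r in [0, 1].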


A.3 SR dynamics 


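The SR population dynamics under sexual reproduction is defined as:

$$p^{t+1}_{\mathbf{i}_K} = r\, \frac{1}{2^k (\bar{w}^t)^2} \sum_{J \subseteq K}\ \sum_{\mathbf{i}'_K \in \mathcal{I}(K)}
p^t_{\mathbf{i}_J, \mathbf{i}'_{-J}}\, w_{\mathbf{i}_J, \mathbf{i}'_{-J}}\; p^t_{\mathbf{i}'_J, \mathbf{i}_{-J}}\, w_{\mathbf{i}'_J, \mathbf{i}_{-J}}
+ (1 - r)\, \frac{p^t_{\mathbf{i}_K}\, w_{\mathbf{i}_K}}{\bar{w}^t}. \tag{8}$$

We can think of $J$ as the set of genes that are inherited from the “first” parent, and
$-J$ as the set of genes that are inherited from the “second” parent. Thus a possible
genotype of the offspring of parents with genotypes $\mathbf{i}, \mathbf{i}'$ is $(\mathbf{i}_J, \mathbf{i}'_{-J})$.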


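Lemma 2. Let $W$ be any fitness landscape, then under the SR dynamics,

$$x^{t+1}_{i_j} = \frac{1}{\bar{w}^t} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}.$$

Proof. Let $J^* = J \cup \{j\}$, $-J^* = K \setminus (J \cup \{j\})$.

$$\begin{aligned}
x^{t+1}_{i_j} &= \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^{t+1}_{i_j, \mathbf{i}_{-j}} && \text{(By definition)} \\
&= r\, \frac{1}{2^k (\bar{w}^t)^2} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)}\ \sum_{\mathbf{i}'_K \in \mathcal{I}(K)}\ \sum_{J \subseteq K}
p^t_{\mathbf{i}_J, \mathbf{i}'_{-J}}\, w_{\mathbf{i}_J, \mathbf{i}'_{-J}}\; p^t_{\mathbf{i}'_J, \mathbf{i}_{-J}}\, w_{\mathbf{i}'_J, \mathbf{i}_{-J}} \\
&\quad + (1 - r) \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} \frac{p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}}{\bar{w}^t} && \text{(By Eq. (8))} \\
&= rC + (1 - r)D.
\end{aligned}$$

We first analyze part $C$. We split the sum over $J \subseteq K$ into subsets that exclude $j$
(that is, $J \subseteq -\{j\}$) and their counterparts $J^* = J \cup \{j\}$ that include $j$:

$$\begin{aligned}
C = \frac{1}{2^k (\bar{w}^t)^2} \sum_{J \subseteq -\{j\}} \Big(
&\sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)}\ \sum_{\mathbf{i}'_K \in \mathcal{I}(K)}
p^t_{\mathbf{i}_J, \mathbf{i}'_{-J}}\, w_{\mathbf{i}_J, \mathbf{i}'_{-J}}\; p^t_{\mathbf{i}'_J, \mathbf{i}_{-J}}\, w_{\mathbf{i}'_J, \mathbf{i}_{-J}} \\
+ &\sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)}\ \sum_{\mathbf{i}'_K \in \mathcal{I}(K)}
p^t_{\mathbf{i}_{J^*}, \mathbf{i}'_{-J^*}}\, w_{\mathbf{i}_{J^*}, \mathbf{i}'_{-J^*}}\; p^t_{\mathbf{i}'_{J^*}, \mathbf{i}_{-J^*}}\, w_{\mathbf{i}'_{J^*}, \mathbf{i}_{-J^*}} \Big).
\end{aligned}$$

In the first double sum ($j \notin J$), the genotype $(\mathbf{i}_J, \mathbf{i}'_{-J})$ ranges over all of $\mathcal{I}(K)$,
so its factor sums to $\bar{w}^t$ (by Eq. (5)), whereas the genotype $(\mathbf{i}'_J, \mathbf{i}_{-J})$ ranges over all
genotypes that carry allele $i_j$ at locus $j$, so its factor sums to
$\sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}} w_{i_j, \mathbf{i}_{-j}}$. In the second double sum ($j \in J^*$) the roles of the
two factors are exchanged. Since there are $2^{k-1}$ subsets $J \subseteq -\{j\}$,

$$C = \frac{1}{2^k (\bar{w}^t)^2} \cdot 2^{k-1} \cdot 2\, \bar{w}^t \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}
= \frac{1}{\bar{w}^t} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}.$$

Finally,

$$rC + (1 - r)D = r\, \frac{1}{\bar{w}^t} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}
+ (1 - r) \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} \frac{p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}}{\bar{w}^t}
= \frac{1}{\bar{w}^t} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}.$$

□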

As in the case of two loci, we use the lemma to show that under the SR dynamics,
the marginal distribution of gene $j \in K$ develops as if gene $j$ is applying the
PW algorithm.

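Given a game $G$ and a joint distribution $P^t$, let
$g^t_{i_j} = \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} \frac{p^t_{i_j, \mathbf{i}_{-j}}}{x^t_{i_j}}\, w_{i_j, \mathbf{i}_{-j}}$.
That is, the expected utility to $j$ of using the
pure action $a_{i_j}$ at time $t$.


Proposition 3. Let $W$ be any fitness matrix, and consider the game $G = W$. Then
under the SR population dynamics, for any distribution $P^t$ and any $r \in [0, 1]$, we
have

$$x^{t+1}_{i_j} = x^t_{i_j}\, \frac{g^t_{i_j}}{\bar{w}^t}. \tag{9}$$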


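Proof. Applying Lemma 2,

$$x^{t+1}_{i_j} = \frac{1}{\bar{w}^t} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} p^t_{i_j, \mathbf{i}_{-j}}\, w_{i_j, \mathbf{i}_{-j}}
= \frac{x^t_{i_j}}{\bar{w}^t} \sum_{\mathbf{i}_{-j} \in \mathcal{I}(-j)} \frac{p^t_{i_j, \mathbf{i}_{-j}}}{x^t_{i_j}}\, w_{i_j, \mathbf{i}_{-j}}
= x^t_{i_j}\, \frac{g^t_{i_j}}{\bar{w}^t},$$

where the last expression is the marginal fitness of allele $i_j$ (By Eq. (6)).

□

The multi-dimensional extension of Corollary 1 follows in the same way from
Proposition 3.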
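The statement of Proposition 3 can also be verified numerically. The sketch below assumes a form of the multi-locus SR dynamics in which, after selection, each offspring gene is taken from either parent with equal probability (a sum over all subsets J of K with weight $1/2^k$, mixed with weight r against the non-recombined population); the data layout, function names, and toy numbers are hypothetical.

```python
from itertools import product, combinations

def sr_step(P, W, k, r):
    """One SR generation for k loci (assumed form): selection, then recombination."""
    wbar = sum(P[g] * W[g] for g in P)
    subsets = [set(c) for size in range(k + 1) for c in combinations(range(k), size)]
    new = {}
    for g in P:
        rec = 0.0
        for J in subsets:
            for h in P:  # h supplies the alleles that the offspring does not inherit
                first = tuple(g[l] if l in J else h[l] for l in range(k))   # parent giving loci in J
                second = tuple(h[l] if l in J else g[l] for l in range(k))  # parent giving loci not in J
                rec += P[first] * W[first] * P[second] * W[second]
        rec /= (2 ** k) * wbar ** 2
        new[g] = r * rec + (1 - r) * P[g] * W[g] / wbar
    return new

def marginal(P, j, a):
    """Frequency of allele a at locus j."""
    return sum(p for g, p in P.items() if g[j] == a)

def predicted_marginal(P, W, j, a):
    """x * (marginal fitness) / (average fitness), written without the division by x."""
    wbar = sum(P[g] * W[g] for g in P)
    return sum(P[g] * W[g] for g in P if g[j] == a) / wbar

# Hypothetical toy instance: k = 3 loci with 2, 3 and 2 alleles.
n = (2, 3, 2)
genotypes = list(product(*(range(m) for m in n)))
W = {g: 1.0 + 0.1 * ((g[0] + 2 * g[1] + 3 * g[2]) % 4) for g in genotypes}
raw = {g: 1.0 + g[0] + g[1] * g[2] for g in genotypes}
total = sum(raw.values())
P = {g: v / total for g, v in raw.items()}
```

Under this assumed form, the marginals of `sr_step(P, W, 3, r)` agree with `predicted_marginal` for every r, which illustrates that the recombination rate does not affect the marginal update under SR.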


B PW and product distributions 


Consider the “uncorrelated” version of the PW algorithm, which is the one used in
[Chastain et al., 2014]:


E 

j 


( 10 ) 
In Chastain et al. [2014] there is no distinction between RS and SR. The
formal definition that they use coincides with RS (p. 1 of the SI text), whereas in an
earlier draft they used SR (p. 5 in Chastain et al. [2013]). In a private
communication the authors clarified that they use SR and RS interchangeably, since under
weak selection they are very close.


Divergence from the Wright manifold Chastain et al. [2014] justify the
assumption that $P^t$ is a product distribution by quoting the result of Nagylaki [1993],
which states that for any process $(P^t)_t$ there is a “corresponding process” on the Wright
manifold, which converges to the same point. However the authors do not explain
why this corresponding process is the one they assume in their paper. To further
stress this point, we will show that the population dynamics and the PW algorithm
used in Chastain et al. [2014] can significantly differ (we saw empirically that the
marginals also differ significantly).

Consider the 2x2 fitness matrix where $w_{11} = 1 + s$, and $w_{ij} = 1$ otherwise.
For simplicity assume first that $r = 0$ (thus SR and RS are the same). Suppose
that $P^0$ is the uniform distribution (that is on the Wright manifold). While the
population dynamics will eventually converge to $p_{11} = 1$, there is some $t$ s.t. $P^t$
is approximately

$$\begin{pmatrix} 5/8 & 1/8 \\ 1/8 & 1/8 \end{pmatrix},$$

whose distance from the product of its marginals $x^t \times y^t$ is $\Omega(1)$ for any
$\ell_d$ norm and regardless of the selection strength $s$. The gap is still large for other
small constant values of $r$ (including when $s \ll r$). Thus the population dynamics
can get very far from the Wright manifold.

In the example above both processes will converge to the same outcome ($p_{11} = 1$),
but at different rates.
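The example with $w_{11} = 1 + s$ and $r = 0$ is easy to reproduce. In the sketch below, the stopping rule (run until $p_{11}$ first reaches 5/8, that is, until $p_{11}$ is about five times each of the other three entries), the choice of the $\ell_1$ norm, and the function names are assumptions made here for illustration.

```python
def asexual_step(P, W):
    """Population dynamics with r = 0: p_ij <- p_ij * w_ij / (average fitness)."""
    wbar = sum(P[i][j] * W[i][j] for i in range(2) for j in range(2))
    return [[P[i][j] * W[i][j] / wbar for j in range(2)] for i in range(2)]

def l1_distance_from_product(P):
    """l1 distance between P and the product of its two marginal distributions."""
    x = [sum(P[i]) for i in range(2)]
    y = [P[0][j] + P[1][j] for j in range(2)]
    return sum(abs(P[i][j] - x[i] * y[j]) for i in range(2) for j in range(2))

def run_until(s, target=5 / 8):
    """Start from the uniform distribution and iterate until p_11 >= target."""
    W = [[1 + s, 1.0], [1.0, 1.0]]
    P = [[0.25, 0.25], [0.25, 0.25]]
    t = 0
    while P[0][0] < target:
        P = asexual_step(P, W)
        t += 1
    return t, P

if __name__ == "__main__":
    for s in (0.01, 0.1):
        t, P = run_until(s)
        print(s, t, P, l1_distance_from_product(P))
```

Since the three entries other than $p_{11}$ stay equal to each other throughout, the distance at the stopping time is close to 1/4 for any small s; only the number of generations needed to get there depends on s.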


Difference in convergence One can also construct examples that converge
to different outcomes. For example, for $s = 0.01$ consider

$$W = \begin{pmatrix} 1.01 & 1 \\ 1 & 1.0099603 \end{pmatrix}.$$

If the initial distribution is $x^0 = y^0 = (0.499, 0.501)$,
then the (independent) PW dynamics converges to $p_{22} = 1$, whereas for $r = 0.5$
the SR dynamics converges to $p_{11} = 1$. Such examples can be constructed for any
values of $s > 0$ and $r < 1$.